RNA variant assessment using transactivation and transdifferentiation

Summary

Understanding the impact of splicing and nonsense variants on RNA is crucial for resolving variant classification and for assessing suitability for precision medicine interventions. This is primarily enabled through RNA studies involving transcriptomics followed by targeted assays using RNA isolated from clinically accessible tissues (CATs), such as blood or skin, of affected individuals. However, insufficient expression of disease genes in CATs poses a major barrier to RNA-based investigations, a limitation we show affects 1,436 Mendelian disease genes. We term these "silent" Mendelian genes (SMGs), the largest portion (36%) of which are associated with neurological disorders. To overcome this limitation, we developed two approaches to induce SMG expression in human dermal fibroblasts (HDFs): CRISPR-activation-based gene transactivation and fibroblast-to-neuron transdifferentiation. Initial transactivation screens involving 40 SMGs informed the development of a highly multiplexed transactivation system, which induced expression of all SMGs tested in HDFs (20/20, 100%) by 6- to 90,000-fold. Direct transdifferentiation of HDFs to neurons led to expression of 193 of the 516 SMGs implicated in neurological disease (37.4%). The magnitude and isoform diversity of SMG expression following either transactivation or transdifferentiation were comparable to those in clinically relevant tissues. We apply transdifferentiation and/or gene transactivation, combined with short- and long-read RNA sequencing, to investigate the impact of variants in USH2A, SCN1A, DMD, and PAK3 on RNA using HDFs derived from affected individuals. Transactivation and transdifferentiation represent rapid, scalable functional genomic solutions for investigating variants affecting SMGs in the patient's own cellular and genomic context.


2.0) using the Minimum Required Sequencing Depth (MRSD) algorithm identified 1,436 genes that are not sufficiently expressed in the most frequently used clinically accessible tissues (whole blood, blood-derived lymphoblastoid cell lines, or human dermal fibroblasts) to support robust analysis of mRNA splicing by short-read RNA-seq. These genes are termed silent Mendelian genes (SMGs). If a muscle biopsy were available, it could supply sufficient RNA for analysis of 166 of the SMGs. (B-D) Top-ranked Gene Ontology processes enriched in the 1,436 SMGs. Top-ranking OMIM diseases (B), cellular processes (C), and molecular processes (D) of the 1,436 SMGs at p < 0.05. Gene Ontology analysis was performed using ShinyGO 0.77, and terms were ranked by fold enrichment and false discovery rate (FDR). p-values are FDR-corrected.
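The ranking in panels B-D relies on two standard quantities: fold enrichment and an FDR-corrected over-representation p-value. The sketch below shows the textbook definitions, assuming a one-sided hypergeometric test and Benjamini-Hochberg correction; it is an illustration of the general approach, not ShinyGO's own implementation, and the function names are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Fold enrichment of a term: (k/n) / (K/N).

    k: query genes annotated to the term; n: size of the query gene list;
    K: background genes annotated to the term; N: size of the background.
    """
    return (k / n) / (K / N)


def hypergeom_sf(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k), X ~ Hypergeometric."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total  # exact integer arithmetic until this final division


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: 10 of 100 query genes vs 200 of 20,000 background genes.
    print(fold_enrichment(10, 100, 200, 20000))
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Real GO tools differ in background choice and multiple-testing details, so the numbers from this sketch will not reproduce the figure exactly.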
Figure S3. gRNA designs informed through multi-omic assessment. Aggregating information on isoform expression, transcriptional start sites, chromatin accessibility and epigenetic marks informs gRNA design. The figure shows an example of the multi-omic information used to design gRNAs for PCDH19. GTEx is first queried to identify isoforms relevant to the clinically relevant tissue. This identifies the relevant 5' ends of transcripts to target, indicated (on this occasion) by the red arrows. Reference transcripts are then located in the UCSC browser. The transcriptional start site (TSS) of the target gene isoform is extracted from FANTOM5 CAGE data and uploaded as a custom track. The promoter is defined as the 500 bp region upstream of the TSS (yellow shading). Open chromatin regions (considered most suitable for gRNA placement), identified by ATAC-seq analysis (see Materials and Methods) of three human dermal fibroblast lines (HDF1-3) and HEK293T cells, are also uploaded as a custom track. Epigenetic feature tracks are also visualised to provide additional information on the promoter state. We use chromatin immunoprecipitation sequencing (ChIP-seq) data from the ENCODE project, generated in Normal Human Lung Fibroblasts (NHLF), to identify histone modifications around the promoter of the target gene, including H3K4Me1 (enriched at active and primed enhancers, and an essential feature of poised chromatin), H3K4Me3 (associated with transcriptionally active or poised chromatin) and H3K27Ac (highly enriched in active enhancers and promoters). Finally, the genomic sequence of the promoter is submitted to the gRNA design tool E-CRISP, which returns a series of gRNAs ranked by specificity and uniqueness. These are uploaded to UCSC, and four high-ranking, nonoverlapping gRNAs are selected, with preference (where possible) given to those falling in open and active/poised chromatin regions. The uniqueness of the gRNA sequences is validated with the UCSC BLAT tool.

To replace the blue fluorescent protein (BFP) tag with a red fluorescent protein (mCherry), the P2A-mCherry fragment was inserted via the NotI and XbaI cloning sites, generating the transgene p.dCas9-ST-mCherry (note: this version was used specifically for lentiviral delivery into HDFs). To allow Blasticidin selection, the fragment was extended to P2A-mCherry-T2A-BSD and inserted via the NotI and XhoI cloning sites, generating the final transgene p.dCas9-ST-mCherry-BSD (note: this is the version used to generate stable HEK293T and BJ-5ta clonal cells). All fragment inserts were synthesised by GenScript and supplied in the pUC57 plasmid backbone. (B) p.P65-HSF1-NeoR encodes eGFP and neomycin resistance markers, allowing FACS and antibiotic selection. The illustration shows the vector map of p.P65-HSF1-NeoR (gift from Prof. Ryan Lister), which is used to generate stable HEK293T and BJ-5ta clonal cells. The plasmid contains the transactivating domains (TADs) p65 and HSF1 (heat shock factor 1) fused to a single-chain variable fragment (scFv) that recognises the GCN4 epitope, as well as enhanced green fluorescent protein (eGFP) and a neomycin resistance gene, used as selection markers. (C and D) Generation of HEK293T and BJ-5ta clonal cell lines stably expressing dCas9-ST-PH for the gRNA transactivation screen. The diagrams show the experimental pipeline for generating and selecting HEK293T and BJ-5ta clonal cells. Briefly, dCas9-ST-mCherry-BSD was first delivered to the cells, followed by Blasticidin selection. After selection and expansion, cells were sorted by fluorescence-activated cell sorting (FACS) to create single-cell clones. The resulting dCas9-ST-stable cells were then further modified with the P65-HSF1-NeoR transgene. The cells were then subjected to Geneticin selection and, lastly, to single-cell FACS to select clonal cells with varying fluorescence intensity. Note: for HEK293T, transgenes were first linearised and transfected using the standard Lipofectamine 3000 protocol. For BJ-5ta, transgenes were packaged in lentivirus and delivered at MOI 20 (dCas9-ST-mCherry-BSD) and MOI 30 (P65-HSF1-Neo). (E) Screening of dCas9-ST-PH clonal stable cells. To screen for the most potent transactivating dCas9-ST-PH clonal cell lines, a single guide (IL1RN-gRNA4) was delivered to cells (via lipofection for HEK293T and lentiviral delivery for BJ-5ta). mRNA was isolated and IL1RN expression assessed by RT-qPCR. The bar graphs show the activation levels of IL1RN in the 10 different clonal cell lines. Data presented are relative gene expression from RT-qPCR, with values normalised to ACTB. RT-qPCR data are presented as mean and standard deviation of three technical replicates, further normalised to the negative control (no-gRNA). Clone 7 (HEK293T) and Clone A (BJ-5ta) were selected for use in gRNA screening experiments (red asterisk).

and HDF dCas9-ST-PH cells using a low multiplicity of infection to deliver ~1 gRNA vector per cell. More than 20,000 cells per cell line were subjected to single-cell Perturb-seq using the 10x Genomics platform. For many genes, the number of gRNAs per cell is positively associated with target gene expression and negatively associated with the number of cells analysed. Data are pooled from all 4 gRNAs per gene. Dark blue lines show the number of cells, light blue lines show transcripts per million (TPM), and the x-axis shows gRNA expression.
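The guide-selection step described in the Figure S3 legend (a 500 bp promoter window upstream of the TSS, then four high-ranking, nonoverlapping guides with preference for open chromatin) can be sketched as follows. This is a minimal illustration under stated assumptions: candidate guides are assumed to arrive as coordinates with a design-tool score (higher is better) and an open-chromatin flag, and ranking open-chromatin guides ahead of score is one possible reading of "preference where possible". The `Guide` type and both functions are hypothetical, not part of E-CRISP or the UCSC tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    name: str
    start: int            # 0-based genomic start (hypothetical input format)
    end: int              # exclusive end
    score: float          # design-tool rank score, assumed higher = better
    open_chromatin: bool  # overlaps an ATAC-seq peak (assumed precomputed)


def promoter_window(tss: int, strand: str, width: int = 500) -> tuple[int, int]:
    """Half-open window of `width` bp immediately upstream of the TSS."""
    if strand == "+":
        return (tss - width, tss)
    # On the minus strand, "upstream" lies at higher genomic coordinates.
    return (tss + 1, tss + 1 + width)


def select_guides(guides, window, n=4):
    """Pick up to n nonoverlapping guides inside the window.

    Assumed ranking: open-chromatin guides first, then by descending score.
    """
    lo, hi = window
    inside = [g for g in guides if g.start >= lo and g.end <= hi]
    ranked = sorted(inside, key=lambda g: (not g.open_chromatin, -g.score))
    chosen = []
    for g in ranked:
        # Keep a guide only if it does not overlap any guide already chosen.
        if all(g.end <= c.start or g.start >= c.end for c in chosen):
            chosen.append(g)
        if len(chosen) == n:
            break
    return chosen
```

A greedy pass like this is simple and deterministic; the legend's final BLAT uniqueness check would still be applied to whatever guides it returns.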

Figure S2. ClinVar variants associated with the 40 SMGs targeted in the transactivation screen. The total number of variants is given, together with the proportion classified as pathogenic (P) or likely pathogenic (LP).

Figure S9. Pseudo-bulk analysis of single-cell transcriptomics data generated from the scRNA-seq gene transactivation screen. A pooled gRNA expression plasmid library (160 gRNAs; 4 gRNAs per gene, targeting 40 SMGs) was delivered by lentivirus to HEK293T dCas9-ST-PH and HDF dCas9-ST-PH cells at a low multiplicity of infection to deliver ~1 gRNA vector per cell. More than 20,000 cells per cell line were subjected to single-cell Perturb-seq using the 10x Genomics platform. Single-cell data were collapsed into a pseudo-bulk analysis in which the expression of each gene is given as the number of reads mapping to that gene as a proportion of all reads generated in the experiment (transcripts per million; TPM).
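The pseudo-bulk collapse described above, and the per-gRNA-count view used in the companion Perturb-seq legend, can be sketched as below. This assumes per-cell read counts are available as plain dictionaries (a hypothetical input format) and follows the legend's own definition: reads for a gene as a proportion of all reads, scaled to one million, with no gene-length correction.

```python
from collections import defaultdict


def pseudobulk_per_million(cell_counts):
    """Sum reads across cells, then express each gene per million total reads.

    cell_counts: list of {gene: read_count} dicts, one per cell (assumed format).
    """
    totals = defaultdict(int)
    for cell in cell_counts:
        for gene, count in cell.items():
            totals[gene] += count
    library = sum(totals.values())
    return {gene: count * 1e6 / library for gene, count in totals.items()}


def pseudobulk_by_guide_count(cell_counts, guides_per_cell):
    """Group cells by number of detected gRNAs and pseudo-bulk each group.

    Returns {n_guides: (number_of_cells, {gene: per-million value})}.
    """
    groups = defaultdict(list)
    for cell, n in zip(cell_counts, guides_per_cell):
        groups[n].append(cell)
    return {n: (len(cells), pseudobulk_per_million(cells))
            for n, cells in sorted(groups.items())}


if __name__ == "__main__":
    cells = [{"SMG_A": 10, "ACTB": 90}, {"SMG_A": 30, "ACTB": 70}]
    print(pseudobulk_per_million(cells))
    print(pseudobulk_by_guide_count(cells, [1, 2]))
```

Returning the cell count alongside each group mirrors the trade-off noted in the Perturb-seq legend: higher gRNA counts per cell tend to raise target expression but leave fewer cells to analyse.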

Figure S10. Neurological disorder genes are faithfully spliced in human dermal fibroblasts. Comparison of local splicing events in 2,484 broadly expressed neurological disorder genes between clinically relevant tissues (different brain regions from GTEx) and clinically accessible tissues, including human dermal fibroblasts, lymphoblastoid cell lines and whole blood. The analysis was conducted using the MAJIQ-CAT tool, which reports the percentage of genes that are not correctly spliced in the clinically accessible tissue, using the clinically relevant tissue as a reference. In all comparisons, human dermal fibroblasts displayed the lowest percentage of incorrectly spliced genes (6-16%), outperforming LCLs and blood, in which genes were more frequently spliced incorrectly.
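The summary statistic in Figure S10 is a per-gene call aggregated into a percentage. A simplified sketch of that aggregation follows, assuming each gene comes with paired percent-spliced-in (PSI) values per splicing event in the reference and accessible tissue, and that a gene counts as incorrectly spliced if any event differs by at least a threshold (0.2 here, an assumed value). This is a deliberately reduced illustration, not MAJIQ-CAT's algorithm, which models splicing quantification probabilistically.

```python
def percent_incorrectly_spliced(psi_by_gene, threshold=0.2):
    """Percentage of genes with at least one event whose |dPSI| >= threshold.

    psi_by_gene: {gene: [(psi_reference, psi_accessible), ...]} (assumed format).
    threshold: assumed dPSI cut-off for calling an event mis-spliced.
    """
    if not psi_by_gene:
        return 0.0
    flagged = sum(
        1
        for events in psi_by_gene.values()
        if any(abs(ref - cat) >= threshold for ref, cat in events)
    )
    return 100.0 * flagged / len(psi_by_gene)
```

Because a single divergent event flags the whole gene, the percentage is sensitive to the threshold chosen; reporting it alongside the threshold keeps tissue comparisons interpretable.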
dCas9-ST-PH-4gRNA-based transactivation of SMGs in HEK293T cells. A. Engineering of p.P65-HSF1 and four gRNA expression cassettes into a single vector. Schematic diagram of p.P65-HSF1 and the KpnI and EcoRI cloning sites used to insert a cassette containing four guide RNA expression cassettes. In the multiplex gRNA expression cassette, each guide was inserted downstream of a different Pol III promoter, namely the mouse mU6 and human hU6, h7SK and hH1 promoters. EcoRI and KpnI restriction sites flank the entire cassette to facilitate cloning. The negative control consists of an empty cassette with no gRNAs (encoding only the Pol III promoter and the scaffold trans-activating CRISPR RNA, i.e. tracrRNA). All gRNA expression cassettes were synthesised by GenScript and supplied in the pUC57 plasmid backbone. B. A multiplex of four dCas9 proteins can each recruit up to ten copies of p65-HSF1 to the target gene promoter on each allele. Illustration of four guide RNAs targeting different sites upstream of an SMG's TSS. Each gRNA recruits the dCas9-ST scaffold to its binding site, and each dCas9-ST recruits up to ten copies of the p65-HSF1 transactivating domains (TADs) through the interaction between the scFv and the SunTag (10 copies of the GCN4 epitope). The TADs act as synthetic transcription factors, promoting the assembly of endogenous transcriptional co-regulators and recruiting RNA Pol II to the site, culminating in transcription from the endogenous promoter of the target SMG. C. Transactivation of SMGs in HEK293T cells via transient transfection. The diagram shows the experimental pipeline for transient expression of dCas9-ST-PH-gRNA in HEK293T cells using Lipofectamine 3000 in a 6-well format. Briefly, dCas9-ST (1.75 µg) and P65-HSF1-gRNA (0.75 µg) are co-transfected. One day post-transfection the medium was replaced, and two days post-transfection cells were collected for RNA analysis. D. The dCas9-ST-PH-gRNA complex is efficiently co-expressed in HEK293T cells. Representative images showing co-expression of green and red fluorescent proteins (eGFP and mCherry) two days after co-transfection of p.dCas9-ST and p.P65-HSF1-gRNA, indicating successful delivery and expression of the transgenes. Scale bar = 100 µm. E. dCas9-ST-PH-4gRNA transactivation experiments in HEK293T cells revealed that all SMGs tested can be transactivated, to varying levels. Bar graph showing activation of 21 SMGs in HEK293T cells via transient expression of dCas9-ST-PH-gRNA. Note that for two genes, DMD and MYT1L, two unique sets of guides were assayed (v1 and v2). Data presented are relative gene expression from RT-qPCR, with values normalised to ACTB. RT-qPCR data are presented as mean and standard deviation of three technical replicates, normalised to the negative control (dCas9-ST-PH-no-gRNA). F. Transactivation of mRNA expression results in protein production. The Western blot shows IL1RN protein in dCas9-ST-PH-gRNA-treated HEK293T cells but not in the negative controls: untransduced (no treatment) and no-gRNA (cells treated with dCas9-ST-PH-no-gRNA). IL1RN (mouse anti-IL1RN, ThermoFisher, #TA803422S) and β-actin (mouse anti-β-actin, Sigma-Aldrich, #A2228) were detected at the expected molecular weights. Western blot data are quantified in the graph.
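The legends report RT-qPCR values normalised first to ACTB and then to the no-gRNA control, plotted as mean and standard deviation of three technical replicates. A common way to compute this is the Livak 2^-ΔΔCt method; the sketch below assumes that method and roughly 100% amplification efficiency for both assays, neither of which is stated in the legend, and the function names are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change by 2^-ddCt for each treated technical replicate.

    ct_target / ct_ref: Ct values for the SMG and ACTB in gRNA-treated cells.
    ct_target_ctrl / ct_ref_ctrl: the same assays in no-gRNA control cells.
    Assumes ~100% amplification efficiency for both assays.
    """
    # Control dCt (target minus ACTB), averaged over its technical replicates.
    delta_ctrl = mean(t - r for t, r in zip(ct_target_ctrl, ct_ref_ctrl))
    # ddCt per treated replicate, converted to a fold change over control.
    return [2 ** -((t - r) - delta_ctrl) for t, r in zip(ct_target, ct_ref)]


def summarise(values):
    """Mean and sample standard deviation, as plotted in the bar graphs."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical Ct values: the target drops 5 cycles relative to ACTB.
    fold = relative_expression([25, 25, 25], [20, 20, 20], [30, 30, 30], [20, 20, 20])
    print(fold, summarise(fold))
```

Each cycle of ΔΔCt corresponds to a two-fold change, so a range as wide as 6- to 90,000-fold spans roughly 2.6 to 16.5 cycles under this assumption.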

Figure S20 .
Figure S19. Gene Ontology analysis of the 516 silent neurological genes (SNGs). The SNGs were subjected to Gene Ontology analysis using ShinyGO v8. The highest-ranking GO terms are reported as fold enrichment and coloured by false discovery rate (-log10 FDR). (A) Biological Processes. (B) Cellular Function. (C) Molecular Function.